
Statistical Match Procedure Used in the 2007-2009 Baselines

The statistical match procedure used in the 2007-2009 baselines is an unconstrained nearest neighbor match similar to that used in 2005-2006, but with additional improvements to the matching methodology that change the block groups and the minimum distance function. Prior to matching, the CPS and PUF records are divided into mutually exclusive groups, and matching is allowed only within each group. The groups are defined by the following "blocking variables":

  • filing status - whether the taxpayer files a joint or non-joint return
  • Social Security receipt - whether the tax unit receives Social Security income
  • dependent children - the number of dependent children of the taxpayer living within the household (none, one, or two or more)
  • dependency status - whether the taxpayer can be claimed as a dependent on another return
Certain block groups are collapsed. Tax units with Social Security income are subdivided into blocks defined by filing status (joint/non-joint) but are not differentiated by dependency status or number of dependent children. Dependent returns without Social Security income are treated as a single block (i.e., they are not differentiated by filing status, and because they cannot themselves claim dependents, there is no additional blocking by number of dependent children).
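
The blocking rules above can be sketched as a function that maps a tax unit to a block key. This is a minimal illustration, not the TRIM3 implementation: the `TaxUnit` type, its field names, and the key labels are hypothetical, and the collapsing order (Social Security first, then dependent returns) is one reading of the description.

```python
from typing import NamedTuple, Tuple


class TaxUnit(NamedTuple):
    # Hypothetical field names, chosen for illustration only.
    joint: bool                 # files a joint return
    has_social_security: bool   # tax unit receives Social Security income
    n_dependent_children: int   # dependent children living in the household
    is_dependent: bool          # can be claimed as a dependent on another return


def block_key(unit: TaxUnit) -> Tuple:
    """Return the block a tax unit falls in; matches occur only within a block."""
    filing = "joint" if unit.joint else "non-joint"
    if unit.has_social_security:
        # Collapsed: filing status only, no split by dependency or children.
        return ("ss", filing)
    if unit.is_dependent:
        # Collapsed: all dependent returns without Social Security income.
        return ("dependent",)
    # Children are grouped as none, one, or two or more.
    return ("no-ss", filing, min(unit.n_dependent_children, 2))
```

Under this reading, a CPS tax unit and a PUF record are candidates for each other only when `block_key` returns the same value for both.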

Several additional constraints are imposed on the matching algorithm that have the effect of reducing the number of PUF records that are potential matches to a particular TRIM3 record. These constraints relate to:

  • Capital Gains and Transfer Program Recipients. The statistical match does not assign capital gains to tax units receiving SSI, TANF, public or subsidized housing, or food stamp benefits.
  • Home Ownership. A TRIM3 tax unit must own a house in order to be matched with a PUF tax unit that claims itemized deductions for home mortgage interest expenses or real estate taxes.
  • State and Local Tax Deductions. A TRIM3 tax unit in a state without a state income tax can only be matched to a PUF record claiming the state and local income tax deduction if the PUF tax unit is also in a state without a state income tax.
  • Adjustments for Keogh/SEP Contributions. A TRIM3 tax unit must have business or farm self-employment income in order to be matched with a PUF tax unit that claims adjustments to income for contributions to Keogh or SEP retirement accounts.
  • Child and Dependent Care Expenses. A TRIM3 tax unit must have qualifying child care expenses to be matched to a PUF tax unit that claims child and dependent care expenses.
  • PUF Variables Exceeding Prescribed Levels. A single large value for a PUF variable can produce skewed results if the PUF record in question represents only a few tax units but is matched to a CPS tax unit representing many tax units. To avoid this problem, the match procedure disallows matches to PUF records with very large values for certain variables.
  • Earner/Non-Earner Status. The PUF record and TRIM3 tax unit must have the same earner/non-earner status. A tax unit is classified as an "earner" if total wages, business, and farm income is non-zero.
  • Asset Related Income. If the TRIM3 tax unit has asset related income (interest, rent and royalties, or dividends) then it is only matched to a PUF return with asset related income.
  • Pension Income. If the CPS tax unit has pension income, then it is only matched to a PUF return with pension income. If the CPS tax unit does not have pension income and the PUF return has pension income, then it is only matched to the PUF record if the head of the CPS tax unit is age 55 or older.
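
Several of the constraints above can be expressed as a single eligibility test applied to each (TRIM3 tax unit, PUF record) pair. The sketch below covers only a subset (transfer recipients, home ownership, Keogh/SEP, earner status, asset income, and pension income). The dictionary keys are hypothetical, and details such as treating "has income" as "non-zero" are assumptions rather than documented TRIM3 behavior.

```python
def is_earner(rec: dict) -> bool:
    # "Earner" means wages + business + farm income is non-zero.
    return (rec["wages"] + rec["business"] + rec["farm"]) != 0


def is_eligible(cps: dict, puf: dict) -> bool:
    """Return True if the PUF record may be matched to the TRIM3/CPS tax unit."""
    # No capital gains for units receiving SSI, TANF, housing, or food stamps.
    if cps["receives_transfers"] and puf["capital_gains"] != 0:
        return False
    # Mortgage interest or real estate tax deductions require home ownership.
    if not cps["owns_home"] and (
        puf["mortgage_interest"] > 0 or puf["real_estate_taxes"] > 0
    ):
        return False
    # Keogh/SEP adjustments require business or farm self-employment income.
    if puf["keogh_sep"] > 0 and cps["business"] == 0 and cps["farm"] == 0:
        return False
    # Earner/non-earner status must agree.
    if is_earner(cps) != is_earner(puf):
        return False
    # Units with asset related income match only PUF returns that have it.
    if cps["asset_income"] != 0 and puf["asset_income"] == 0:
        return False
    # Pension rules, including the age-55 exception.
    if cps["pension"] > 0 and puf["pension"] == 0:
        return False
    if cps["pension"] == 0 and puf["pension"] > 0 and cps["head_age"] < 55:
        return False
    return True
```

Each rule only removes candidates, so the checks can be applied in any order.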

The 2007 baseline introduced a new "minimum distance" function for matching tax units, which depends on whether each unit has a top-coded income amount. It also continued the practice of using the PUF to restore variation to top-coded CPS incomes. During this era of the CPS, the Census Bureau top-coded income amounts exceeding certain thresholds in order to preserve confidentiality, and replaced top-coded amounts with averages calculated for all top-coded individuals. The replacement values used for earned income variables varied by gender, race/ethnicity, and whether the person worked full time for the full year. One goal of the statistical match was to increase variation in income amounts over the threshold, allowing for more precise calculation of taxes.

Once the match procedure has identified the set of PUF records that can be matched to a given CPS tax unit, a PUF record is selected using a "minimum distance" function. The procedure differs for units that are "high income" (that is, with one or more income amounts above the top-coding threshold--see below) and for lower income units. If the tax unit is not treated as a "high income" unit for the purpose of the match, then the distance function is computed based on AGI. Capital gains and IRA and Keogh contributions are obtained from the PUF record being considered for the match. The capital gains are added to (and the IRA and Keogh contributions are subtracted from) the preliminary AGI calculated by TRIM3. The resulting AGI is compared to the AGI of each available PUF record, and the record with the smallest difference in AGI is selected.
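
The AGI-based selection for units that are not "high income" can be sketched as follows. This is an illustration under stated assumptions: the function name and dictionary keys are hypothetical, the distance is taken to be the absolute difference in AGI, and ties are resolved by taking the first candidate, which the description does not specify.

```python
def select_by_agi(prelim_agi: float, candidates: list) -> dict:
    """Pick the candidate PUF record whose AGI is closest to the adjusted AGI."""

    def distance(puf: dict) -> float:
        # Adjust the TRIM3 preliminary AGI using amounts from this candidate:
        # add its capital gains, subtract its IRA and Keogh contributions.
        adjusted = prelim_agi + puf["capital_gains"] - puf["ira_keogh"]
        return abs(adjusted - puf["agi"])

    # min() returns the first record with the smallest distance.
    return min(candidates, key=distance)
```

Because the adjustment uses amounts from the candidate itself, the adjusted AGI is recomputed for every PUF record considered.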

The statistical match restores variation to the following top-coded CPS income variables: wages, business income, farm income, interest, pensions, a combined variable representing dividends, estates, and trusts, and a combined variable representing rents and royalties. If a tax unit has a top-coded value for one or more of these variables (meaning it is a "high income" unit), the minimum distance function is computed by examining the difference between the CPS tax unit and the PUF record for each of ten income items reported on both the CPS and the PUF (wages, business income, farm income, interest, pensions, dividends/estates/trusts, rents/royalties, total Social Security benefits, unemployment compensation, and alimony received). However, for each top-coded income source, the matching algorithm:

  • Removes the top-coded income variable from the distance function.
  • Imposes the additional constraint that only PUF records with values in excess of the censoring point (the point at which CPS top-coding begins) are eligible to be matched to the TRIM3 tax unit.
The PUF record with the least absolute difference across these income items is then selected as a match.
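
The "high income" selection described above can be sketched as a filter followed by a distance minimization. This is a hedged illustration: the item names, the `censor` mapping of censoring points, and the function name are hypothetical, and summing the absolute differences across items is an assumption about how the per-item differences are combined.

```python
ITEMS = (
    "wages", "business", "farm", "interest", "pensions",
    "div_est_trusts", "rents_royalties", "social_security",
    "unemployment", "alimony",
)


def select_high_income(cps: dict, topcoded: set, censor: dict, candidates: list):
    """Pick a PUF record for a CPS tax unit with top-coded income sources."""
    # Only PUF records above the censoring point for every top-coded
    # source are eligible.
    eligible = [p for p in candidates if all(p[v] > censor[v] for v in topcoded)]
    if not eligible:
        return None  # no eligible record; handling of this case is not sketched
    # Top-coded sources are dropped from the distance function.
    used = [v for v in ITEMS if v not in topcoded]
    # Assumed distance: sum of absolute differences over the remaining items.
    return min(eligible, key=lambda p: sum(abs(cps[v] - p[v]) for v in used))
```

Dropping the top-coded items keeps the censored CPS values, which are group averages rather than reported amounts, from influencing which record is chosen.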

Once a PUF record has been selected, variables from that record are assigned to the CPS tax unit. The weight of the PUF record is then reduced by the weight of the CPS tax unit. Once the weight for a PUF record has been reduced to zero, it cannot be matched to additional CPS tax units.
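
The weight bookkeeping can be sketched as below. The names are hypothetical, and clamping the remaining weight at zero when the CPS weight exceeds it is an assumption; the description does not say how that case is handled.

```python
def record_match(puf: dict, cps_weight: float) -> bool:
    """Reduce the PUF record's weight after a match.

    Returns True if the record can still be matched to other CPS tax units.
    """
    # Assumption: the remaining weight never goes below zero.
    puf["weight"] = max(0.0, puf["weight"] - cps_weight)
    return puf["weight"] > 0


def still_available(puf_records: list) -> list:
    # Records whose weight has been reduced to zero are no longer candidates.
    return [p for p in puf_records if p["weight"] > 0]
```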

When a PUF record is matched to a top-coded tax unit, we replace the CPS income amount (for any top-coded variables) with the amount obtained from the PUF record, restoring variation to the top-coded income variable. The modified income variables do not overwrite the CPS income variables stored in the TRIM3 database, but are stored as a set of alternative income variables for use as input to the Federal Tax simulation. In particular, the baselines labeled "highinc" use the PUF income values, while the regular baselines use the CPS income values.
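
A minimal sketch of how the alternative income variables might be built without overwriting the CPS values. The function and key names are hypothetical and do not reflect the actual TRIM3 database layout.

```python
def highinc_income(cps_income: dict, puf_income: dict, topcoded: set) -> dict:
    """Build the alternative ("highinc") income set for one tax unit."""
    alt = dict(cps_income)       # copy, so the stored CPS values are untouched
    for var in topcoded:
        alt[var] = puf_income[var]  # PUF amount replaces the top-coded amount
    return alt
```

Only the top-coded variables take PUF values; all other income amounts remain as reported on the CPS.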

Because the variables obtained through the statistical match for an individual tax unit all come from a single PUF record, we are limited in our ability to align any specific variable to a target. However, we do make some adjustments. We adjust the capital gains and deduction dollar amounts to reflect the change in average dollar amounts between the year of the PUF data and the tax year being simulated, and we make minor adjustments to increase or decrease the likelihood of selecting a PUF record based on whether the record has income or deduction values from particular sources (such as capital gains). We also perform some minimal alignment by adjusting the dollar amounts used to disallow matches to PUF records with very large income or deduction amounts.

The 2007-2009 baselines used the 2006 PUF.